Agenda

  • Status update for the DAUF project
  • New ABM 2025 with overall result and updates
  • New beta version of the subject-based KTH Research Information app
  • News related to data curation - new version of DiVA coming
  • OpenAlex on Sunet
  • Future directions and your questions and feedback

About the DAUF project

  • Creating services and tools for presentation of research information data, improved data flows and connecting data sources within KTH
  • Agile model with two-week sprints
  • Collaboration between KTH Library, RSO and ITA
  • Part of the IT portfolio for Research (Delportfölj forskning), within the object “Publicering och analys”

Status and progress update

Progress overview - since last demo

  • This year's version of ABM was released about a week ago

  • Recently released beta version of the topic-based KTH Research Information

  • POC for the KTH Indicators dashboard based on consolidated indicators collected from across KTH.

  • Tests and prep for GDP 2.0 (Gemensamma dataprojektet) - new standard for Swedish project data

  • The open data source OpenAlex (used in the Open Leiden Ranking) has been evaluated. Work has started to map DiVA data against OpenAlex and to enrich DiVA from this source.

Annual Bibliometric Monitoring 2025

Changes in ABM 2024

  • More interactive graphs (plotly)
  • Changed OA graph
  • Enabled selection of number of rows for co-publication tables
  • Some cosmetic changes

Brief ABM results for KTH

  • Number of publications is consistently declining
  • Citation indicators are slightly increasing
  • Journal indicators on a stable level
  • Small changes in co-publication patterns
  • Share of Open Access publications is steadily increasing

KTH Research Information - Topics (beta)

Data curation and DiVA

  • DiVA is harvested through OAI-PMH into a database
  • Curation and annotation can be done against this database
  • Preparation for the new DiVA
  • Ambition to decouple importing and curation from DiVA
  • Preparation to use APIs to communicate with DiVA
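
As an illustration of the harvesting step, here is a minimal sketch of OAI-PMH paging in Python using only the standard library. It is not the project's actual harvester. The base URL is a placeholder and the "mods" metadata prefix is an assumption; the verb and argument names (ListRecords, metadataPrefix, resumptionToken) come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="mods", resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is given, the protocol requires it to be the
    only argument besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [
        e.text
        for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text.strip() if token is not None and token.text else None
    return ids, next_token


# A made-up response page, used only to demonstrate the parsing.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_page(SAMPLE_PAGE)
next_url = list_records_url("https://example.org/oai", resumption_token=next_token)
```

A harvester would repeat the request with each new token until a page arrives without one, writing the records to the database as it goes.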

DiVA curation and stats

  • some summary data?

DiVA harvesting

The DAUF project now harvests DiVA publication data using the OAI-PMH protocol which regularly updates a single file duckdb database, openly available from object storage:

https://data.bibliometrics.lib.kth.se/kthcorpus/oai.db

The database with the harvested information is currently about 4.4 GB. It is regularly updated and contains MODS and JSON representations of “all-kth” DiVA records.
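
A minimal sketch of how a consumer might fetch and inspect this file from Python, assuming the third-party duckdb package is installed. The helper names are hypothetical. Since the table names are not given here, the sketch only lists whatever tables exist rather than querying a specific one.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

DB_URL = "https://data.bibliometrics.lib.kth.se/kthcorpus/oai.db"


def local_copy_path(url, cache_dir="."):
    """Where the downloaded file is stored: cache_dir plus the URL's file name."""
    return Path(cache_dir) / Path(urlparse(url).path).name


def fetch_database(url=DB_URL, cache_dir="."):
    """Download the database once (about 4.4 GB) and reuse the local copy."""
    target = local_copy_path(url, cache_dir)
    if not target.exists():
        urlretrieve(url, target)
    return target


def list_tables(db_path):
    """Open the file read-only and list its tables, assuming no table names."""
    import duckdb  # third-party dependency: pip install duckdb

    con = duckdb.connect(str(db_path), read_only=True)
    try:
        return [row[0] for row in con.execute("SHOW TABLES").fetchall()]
    finally:
        con.close()


# Typical use (this triggers the large download on first run):
#   print(list_tables(fetch_database()))
```

Opening the file read-only keeps the local copy unchanged, so it can be replaced with a fresh download whenever the published database is updated.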

Swedish bibliometric resource based on OpenAlex

X

  • Ambition at KTH to track visions and goals using indicators
  • Project within Strategisk verksamhetsanalys
  • Workshops to align available data with goals
  • Relatively manual data collection process from scattered systems at KTH
  • Indicator report + beta dashboard for testing purposes
  • Intended for KTH leadership; enables comparison of indicators across schools

Work with KTH indicators & dashboard POC

Efforts at other universities

Workshop within BenchTech network

  • Members: TU Munich, ETH, EPFL, Virginia Tech, MIT, TU Delft, HKUST, DTU, PoliMi
  • Theme of this meeting: institutional research, strategic planning and data insights within university administration

Examples:

  • Several presentations on modelling of student success and predictions on admissions and drop-off
  • Modelling employment outcomes (PoliMi)
  • Semantic course mapping - find overlap/synergies (ETH)
  • Iterative development of gender diversity dashboard (TU Delft)

Efforts at other universities - Cont’d

Take home messages:

  • Most universities have more unified data handling than KTH
  • Often a mix of tools for regular reporting and dashboards and ad hoc analysis
    “All money spent on people, not software” - reason for open tools
  • Sometimes frustrating dichotomy between internal processes (slow) vs data and needs for analysis (fast)
  • Less emphasis on univ. rankings - more on data flows and analytics

Data Curation

Data infrastructure overview

Object storage (S3)

General Dataflow

+--------------------------------+
|                                |
|          Data Sources          |
|                                |
+--------------------------------+
                 |                
  Clean / Crosscheck / Transform  
                 v                
+--------------------------------+
|                                |
|          Curated Data          |
|                                |
+--------------------------------+
                 |                
           Write / POST           
                 v                
+--------------------------------+
|                                |
|     Object Storage (minio)     |
|                                |
+--------------------------------+
                 |                
            Read / GET            
                 v                
+--------------------------------+
|                                |
|     Data Consumer / Client     |
|                                |
+--------------------------------+
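
The flow in the diagram can be sketched end to end in a few lines of Python. This is an illustration only: the in-memory store is a stand-in for the S3-compatible object storage, and the field names, the object key and the cleaning rules are made up for the example.

```python
import csv
import io


class InMemoryObjectStore:
    """Stand-in for an S3-compatible object store: keys map to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def curate(rows):
    """Clean / crosscheck / transform: trim whitespace, drop rows without
    an id, and keep only the first row for each id."""
    seen, curated = set(), []
    for row in rows:
        clean = {k: v.strip() for k, v in row.items()}
        if not clean.get("id") or clean["id"] in seen:
            continue
        seen.add(clean["id"])
        curated.append(clean)
    return curated


def to_csv_bytes(rows):
    """Serialize a non-empty list of dicts as UTF-8 encoded CSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def read_csv_bytes(data):
    """Parse UTF-8 encoded CSV back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


# Data source -> curated data -> object storage (write) -> consumer (read)
source_rows = [
    {"id": " p1 ", "title": "First paper "},
    {"id": "p1", "title": "First paper"},
    {"id": "", "title": "No identifier"},
    {"id": "p2", "title": " Second paper"},
]
store = InMemoryObjectStore()
store.put("curated/publications.csv", to_csv_bytes(curate(source_rows)))
consumer_rows = read_csv_bytes(store.get("curated/publications.csv"))
```

The point of the pattern is that consumers only ever read curated files from storage, so the cleaning logic can change without any client needing to know.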

GDP

GDP (Gemensamma data för projekt) is an effort by a number of Swedish research funders to create a common data model for project data. The five funding agencies Energimyndigheten, Formas, Forte, Vetenskapsrådet and Vinnova are developing a standard that enables sharing of open data about funding and related information.

The standard is developed in cooperation with a reference group that includes universities and other organisations in the university sector. KTH participates in the reference group.

GDP data mobilization

OpenAlex

  • Research outputs from KTH in OpenAlex - We have started to evaluate OpenAlex as a data source…

    • Comparisons against DiVA, WoS, Scopus, BIBMET

    • Development of search criteria to capture all KTH publications

    • R package has been developed with a client - https://github.com/KTH-Library/openalex

    • Discussions with SUNET and other universities about common resource
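
As a starting point for such search criteria, the following hedged sketch builds a query URL for the OpenAlex works endpoint in Python. It is not the R client mentioned above. The ROR value is a placeholder to be replaced with the institution's real identifier, and a single institution filter is unlikely to capture every publication, which is why the comparisons against other sources matter.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def works_query_url(ror_url, from_date=None, per_page=200, cursor="*"):
    """Build a works query filtered on an institution's ROR identifier.

    ror_url:   full ROR URL of the institution (placeholder in this sketch)
    from_date: optional ISO date (YYYY-MM-DD) for the earliest publication date
    cursor:    "*" starts cursor paging; later pages use the cursor value
               returned in the previous response
    """
    filters = [f"authorships.institutions.ror:{ror_url}"]
    if from_date is not None:
        filters.append(f"from_publication_date:{from_date}")
    params = {"filter": ",".join(filters), "per-page": per_page, "cursor": cursor}
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


# Placeholder ROR, for illustration only.
example_url = works_query_url("https://ror.org/0example", from_date="2020-01-01")
```

Counts returned by a query like this can then be compared with the same period in DiVA, WoS and Scopus to see which records the filter misses.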

Future work and discussion

Future work and directions

  • x

Related activities

  • KTH Cris/Rims

  • KTH Insights/datastyrning (MS Fabric/Power BI)

Questions and Answers

Please provide your input in chat or verbally.

  • Questions, suggestions or comments?

If you prefer to give your feedback later or come up with questions after this demo, you are always welcome to email us at biblioteket@kth.se.

Thank you for attending!